feat: enforce limit-ratio quality gate and bump 0.8.0 #4
Merged
SummerOneTwo merged 2 commits into master on Apr 28, 2026
Guarantee final generated tests prioritize limit-oriented coverage by requiring at least half type=3/4 cases by default, and verify this via manifest-backed quality checks with an explicit opt-out. Also synchronize workflow docs and plugin/package versions for the 0.8.0 release line. Made-with: Cursor
Pull request overview
This PR introduces a “limit-oriented coverage” quality gate by enforcing that the final generated test set contains at least 50% extreme/TLE cases (type=3/4) when possible, adds a manifest-backed verification check for the ratio (enabled by default with explicit opt-out), and bumps project/plugin/package versions to 0.8.0.
Changes:
- Update `problem_generate_tests` sampling to prioritize type=3/4 for at least half of final tests and emit a `.autocode_tests_manifest.json` manifest plus ratio stats.
- Add `problem_verify_tests` `limit_ratio` check (default enabled; opt-out via `enable_limit_ratio=false`) with unit tests for pass/fail/default/opt-out behavior.
- Sync documentation/workflow guidance and bump version strings to `0.8.0`.
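The sampling policy in the first change can be illustrated with a minimal sketch. This is not the code in `problem_generate_tests`. The function name `sample_final_tests`, the `Candidate` shape, the seed parameter, and the rule that the quota is rounded up are all assumptions made for illustration. Only the "type=3/4 fills at least half of the final tests when possible" rule comes from the PR text.

```python
import math
import random
from typing import TypedDict


class Candidate(TypedDict):
    # Hypothetical record shape; the real tool may store more fields.
    name: str
    type: int  # per the PR text, type 3/4 are the extreme/TLE cases


LIMIT_TYPES = {3, 4}


def sample_final_tests(
    candidates: list[Candidate], final_count: int, seed: int = 0
) -> list[Candidate]:
    """Pick up to final_count tests, giving limit cases at least half the slots when available."""
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    limit = [c for c in candidates if c["type"] in LIMIT_TYPES]
    other = [c for c in candidates if c["type"] not in LIMIT_TYPES]
    rng.shuffle(limit)
    rng.shuffle(other)

    # Assumption: the quota is half of the final count, rounded up.
    quota = math.ceil(final_count / 2)
    chosen = limit[:quota]

    # Fill the remaining slots with non-limit cases first, then leftover limit cases.
    remaining = other + limit[quota:]
    chosen += remaining[: final_count - len(chosen)]
    return chosen
```

If fewer limit cases exist than the quota requires, the sketch takes all of them and fills the rest with other cases, which is one reading of "when possible" in the overview.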
Reviewed changes
Copilot reviewed 15 out of 16 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| uv.lock | Bumps editable package version to 0.8.0. |
| pyproject.toml | Updates project version to 0.8.0. |
| src/autocode_mcp/__init__.py | Updates __version__ to 0.8.0. |
| .claude-plugin/plugin.json | Bumps plugin manifest version to 0.8.0. |
| tests/test_packaging.py | Updates version assertion to 0.8.0. |
| tests/test_plugin_manifest.py | Updates plugin manifest version assertion to 0.8.0. |
| src/autocode_mcp/tools/problem.py | Enforces limit-case quota in sampling, writes test manifest, and returns new limit-ratio stats. |
| src/autocode_mcp/tools/test_verify.py | Adds manifest-backed limit_ratio verification with default enable + explicit opt-out. |
| tests/test_tools/test_problem.py | Adds tests for limit_ratio verification and the new sampling quota behavior. |
| src/autocode_mcp/prompts/__init__.py | Updates prompts to reflect the new sampling policy. |
| README.md | Documents the new generation policy and verification gate. |
| skills/autocode-workflow/SKILL.md | Updates workflow/quality gate documentation for the 50% extreme/TLE threshold. |
| agents/autocode-workflow.md | Updates agent instructions to enforce the quality requirement during test generation. |
| scripts/workflow_guard.py | Updates workflow guard messaging to reflect the new preference/requirement. |
| CLAUDE.md | Syncs workflow step documentation with the new quality threshold. |
| CHANGELOG.md | Adds 0.8.0 release notes describing the new gate and behavior. |
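The manifest-backed `limit_ratio` verification listed for `src/autocode_mcp/tools/test_verify.py` could look roughly like the sketch below. The manifest schema (a top-level `tests` list whose entries carry a `type` field), the helper names `load_manifest` and `check_limit_ratio`, the result dictionary, and the treatment of an empty manifest as a failure are assumptions. Only the 50% threshold, the default-enabled behavior, and the `enable_limit_ratio` opt-out come from the PR text.

```python
import json
from pathlib import Path
from typing import Any

LIMIT_TYPES = {3, 4}  # extreme/TLE case types named in the PR


def load_manifest(path: str) -> dict[str, Any]:
    """Read a manifest such as .autocode_tests_manifest.json (schema assumed)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def check_limit_ratio(
    manifest: dict[str, Any],
    min_ratio: float = 0.5,
    enable_limit_ratio: bool = True,
) -> dict[str, Any]:
    """Report whether limit cases make up at least min_ratio of the final tests."""
    if not enable_limit_ratio:
        # Explicit opt-out: the check is skipped and does not fail verification.
        return {"check": "limit_ratio", "skipped": True, "passed": True}

    tests = manifest.get("tests", [])
    limit_count = sum(1 for t in tests if t.get("type") in LIMIT_TYPES)
    # Assumption: an empty manifest yields ratio 0.0 and therefore fails.
    ratio = limit_count / len(tests) if tests else 0.0
    return {
        "check": "limit_ratio",
        "skipped": False,
        "ratio": ratio,
        "passed": ratio >= min_ratio,
    }
```

Reading the ratio from the manifest rather than re-inspecting the generated inputs keeps the check cheap, at the cost of trusting that the manifest matches the files on disk.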
Address Copilot review by matching schema wording with actual deterministic ordering and preventing unconditional signature-based de-duplication during final sampling, so enable_dedup=false semantics remain effective. Made-with: Cursor
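The de-duplication fix described in this commit message can be sketched as follows. This is an illustration and not the code in `problem.py`: the helper names, the `(name, content)` tuple shape, the SHA-256 signature, and sorting by name as the deterministic order are all assumptions. The point it shows is that signature-based de-duplication runs only when `enable_dedup` is true.

```python
import hashlib


def signature(content: str) -> str:
    """Content signature used to detect duplicate test inputs (hash choice assumed)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def select_candidates(
    candidates: list[tuple[str, str]], enable_dedup: bool = True
) -> list[tuple[str, str]]:
    """Order (name, content) pairs deterministically; drop duplicates only if enabled."""
    # Assumption: sorting by name stands in for the tool's deterministic ordering.
    ordered = sorted(candidates, key=lambda c: c[0])
    if not enable_dedup:
        # enable_dedup=false keeps every candidate, including identical inputs.
        return ordered

    seen: set[str] = set()
    unique: list[tuple[str, str]] = []
    for name, content in ordered:
        sig = signature(content)
        if sig in seen:
            continue
        seen.add(sig)
        unique.append((name, content))
    return unique
```

With de-duplication on, the first candidate in the deterministic order wins when two inputs share a signature.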
Summary
- `problem_verify_tests` `limit_ratio` quality gate (default enabled, explicit opt-out via `enable_limit_ratio=false`), and add tests for pass/fail/default/opt-out behavior

Test plan
- `uv run pytest tests/ -q`
- `uv run ruff check .`
- `uv run mypy src/`
- `claude plugin validate .`